[GLUTEN-10992][VL] Fix MatchError for KeyGroupedPartitioning in native shuffle by brijrajk · Pull Request #12335 · apache/gluten

brijrajk · 2026-06-22T19:05:40Z

What changes were proposed in this pull request?

When Spark 4.0's V2 bucketing shuffle (spark.sql.sources.v2.bucketing.shuffle.enabled=true) is used in a join where only one side reports partitioning, Spark generates a ShuffleExchangeExec with KeyGroupedPartitioning as its output partitioning.

The default case _ => in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a ColumnarShuffleExchangeExec for this node without validation. When the query executed, ExecUtil.genShuffleDependency crashed with a scala.MatchError because its pattern match had no case for KeyGroupedPartitioning.
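The failure mode is ordinary Scala behaviour and can be reproduced in isolation. The sketch below is only an illustration: `Partitioning`, `HashPartitioning`, and `KeyGroupedPartitioning` here are hypothetical, simplified stand-ins, not Spark's classes, and `describe` is an invented helper. It shows how a match with no case for a subtype compiles but throws scala.MatchError at runtime.

```scala
import scala.util.Try

// Hypothetical stand-ins for illustration only; these are not Spark's classes.
trait Partitioning
case class HashPartitioning(numPartitions: Int) extends Partitioning
case class KeyGroupedPartitioning(numPartitions: Int) extends Partitioning

object MatchErrorSketch {
  // The trait is not sealed, so the compiler cannot warn about the missing case.
  def describe(p: Partitioning): String = p match {
    case HashPartitioning(n) => s"hash($n)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(HashPartitioning(4))) // prints "hash(4)"

    // No case matches KeyGroupedPartitioning, so the match throws at runtime.
    val failure = Try(describe(KeyGroupedPartitioning(4))).failed.get
    println(failure.getClass.getName) // prints "scala.MatchError"
  }
}
```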

Changes:

VeloxSparkPlanExecApi.genColumnarShuffleExchange: add an explicit case _: KeyGroupedPartitioning => ahead of the default case. The new case adds a fallback tag and returns the vanilla ShuffleExchangeExec, which prevents a ColumnarShuffleExchangeExec from being created for an unsupported partitioning type.
ExecUtil.genShuffleDependency: add an explicit wildcard case other => that throws GlutenNotSupportException instead of the cryptic scala.MatchError, as a defensive guard for any future unknown partitioning types.
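The two changes above form a two-layer guard: reject the unsupported partitioning at planning time, and fail with a descriptive exception if anything unexpected still reaches execution. A minimal sketch of that shape follows. Every type and method in it (`Plan`, `VanillaShuffle`, `ColumnarShuffle`, `NotSupportedException`, `genShuffle`, `genDependency`) is a hypothetical stand-in and not Gluten's or Spark's actual API.

```scala
object FallbackSketch {
  // Hypothetical stand-ins; not the real Spark or Gluten classes.
  sealed trait Partitioning
  case class HashPartitioning(numPartitions: Int) extends Partitioning
  case class KeyGroupedPartitioning(numPartitions: Int) extends Partitioning
  case class SomeFuturePartitioning(numPartitions: Int) extends Partitioning

  sealed trait Plan
  case class VanillaShuffle(p: Partitioning, fallbackReason: String) extends Plan
  case class ColumnarShuffle(p: Partitioning) extends Plan

  class NotSupportedException(msg: String) extends UnsupportedOperationException(msg)

  // Layer 1 (planning): the explicit case sits before the default,
  // so the unsupported partitioning never becomes a columnar shuffle.
  def genShuffle(p: Partitioning): Plan = p match {
    case k: KeyGroupedPartitioning =>
      VanillaShuffle(k, "KeyGroupedPartitioning is not supported by native shuffle")
    case other => ColumnarShuffle(other)
  }

  // Layer 2 (execution): a wildcard turns an unknown type into a clear error
  // instead of a scala.MatchError.
  def genDependency(p: Partitioning): String = p match {
    case HashPartitioning(n) => s"hash:$n"
    case other =>
      throw new NotSupportedException(s"Partitioning $other is not supported by native shuffle")
  }

  def main(args: Array[String]): Unit = {
    println(genShuffle(KeyGroupedPartitioning(4)))
    println(genShuffle(HashPartitioning(4)))
    println(scala.util.Try(genDependency(SomeFuturePartitioning(2))))
  }
}
```

The planning-time case is what avoids the crash; the wildcard only changes how a future, still-unhandled partitioning type would fail.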

How was this patch tested?

The existing testGluten("SPARK-41471: shuffle one side: only one side reports partitioning") tests in GlutenKeyGroupedPartitioningSuite (both spark40 and spark41) reproduce the crash exactly. They set V2_BUCKETING_SHUFFLE_ENABLED=true with only one bucketed side, which triggers a ShuffleExchangeExec with KeyGroupedPartitioning output, and then call checkAnswer. After this fix, these tests pass without a MatchError.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (https://claude.ai/code)

Related issue: #10992

liuneng1994 · 2026-06-27T07:12:38Z

Can you please add some tests for KeyGroupedPartitioning?

Copilot

Pull request overview

Fixes a Spark 4.0 native-shuffle crash when Spark produces a ShuffleExchangeExec with KeyGroupedPartitioning (e.g., with V2 bucketing shuffle enabled and only one join side reporting partitioning). The change prevents Gluten from creating a native ColumnarShuffleExchangeExec for an unsupported partitioning type and replaces a runtime scala.MatchError with a clearer Gluten exception.

Changes:

Add an explicit KeyGroupedPartitioning fallback in VeloxSparkPlanExecApi.genColumnarShuffleExchange to return vanilla ShuffleExchangeExec (tagged for fallback) instead of creating ColumnarShuffleExchangeExec.
Add a defensive default case other => in ExecUtil.genShuffleDependency to throw GlutenNotSupportException rather than a scala.MatchError for unknown partitioning types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| backends-velox/src/main/scala/org/apache/spark/sql/execution/utils/ExecUtil.scala | Adds a default match case to fail fast with `GlutenNotSupportException` instead of `MatchError` for unsupported/unknown partitioning types. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxSparkPlanExecApi.scala | Adds an explicit `KeyGroupedPartitioning` fallback path to avoid creating native columnar shuffle for an unsupported partitioning. |

+      case other =>
+        throw new GlutenNotSupportException(
+          s"Partitioning ${other.getClass.getSimpleName} is not supported by native shuffle")


…e shuffle

When Spark 4.0's V2 bucketing shuffle (spark.sql.sources.v2.bucketing.shuffle.enabled=true) is used in a join where only one side reports partitioning, Spark generates a ShuffleExchangeExec with KeyGroupedPartitioning as its output. The default case in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a ColumnarShuffleExchangeExec for this node, which then crashed with a scala.MatchError in ExecUtil.genShuffleDependency because KeyGroupedPartitioning was not handled in the native partitioning match.

Fix by adding an explicit KeyGroupedPartitioning case to genColumnarShuffleExchange that marks the shuffle for fallback to vanilla Spark.

Also harden ExecUtil.genShuffleDependency with an explicit wildcard that throws GlutenNotSupportException instead of a cryptic MatchError for any future unknown partitioning types. The exception now embeds the full partitioning toString (expressions, numPartitions) to aid debugging.

Add a dedicated GlutenKeyGroupedPartitioningSuite test (spark40 and spark41) that asserts the KeyGroupedPartitioning shuffle falls back to a vanilla ShuffleExchangeExec and is never offloaded to ColumnarShuffleExchangeExec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

brijrajk · 2026-06-27T13:26:22Z

@liuneng1994 thanks for the review. Both comments are addressed in the latest push:

Tests: added a dedicated GlutenKeyGroupedPartitioningSuite test (spark40 and spark41) that asserts the KeyGroupedPartitioning shuffle falls back to a vanilla ShuffleExchangeExec and is never offloaded to ColumnarShuffleExchangeExec, then checks the join result. This complements the existing SPARK-41471 one-side tests by verifying the fallback path specifically rather than just the shuffle count.
Error message (Copilot suggestion): the GlutenNotSupportException in ExecUtil.genShuffleDependency now embeds the full partitioning toString (expressions, numPartitions) instead of only the class name.
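The fallback assertion described in the reply above amounts to walking the executed plan and checking which shuffle operators appear. The sketch below shows that idea on a hypothetical, simplified plan tree. `Plan`, `Scan`, `Join`, `VanillaShuffle`, `ColumnarShuffle`, and this `collect` are stand-ins written for illustration; they are not Spark's SparkPlan API or the code of the actual test.

```scala
// Hypothetical, simplified plan tree; not Spark's SparkPlan API.
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order traversal that gathers every node the partial function accepts.
  def collect[T](pf: PartialFunction[Plan, T]): Seq[T] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}
case class Scan(table: String) extends Plan { val children: Seq[Plan] = Nil }
case class VanillaShuffle(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class ColumnarShuffle(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan { val children: Seq[Plan] = Seq(left, right) }

object PlanCheckSketch {
  def vanillaShuffles(plan: Plan): Seq[VanillaShuffle] =
    plan.collect { case s: VanillaShuffle => s }

  def columnarShuffles(plan: Plan): Seq[ColumnarShuffle] =
    plan.collect { case s: ColumnarShuffle => s }

  def main(args: Array[String]): Unit = {
    // One side is already bucketed; only the other side is shuffled.
    val plan = Join(Scan("bucketed"), VanillaShuffle(Scan("unbucketed")))

    // The shuffle must have fallen back and must not have been offloaded.
    assert(vanillaShuffles(plan).size == 1)
    assert(columnarShuffles(plan).isEmpty)
  }
}
```

Checking the operator type, and not only the number of shuffles, is what distinguishes a fallback from a native shuffle that happens to succeed.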

github-actions Bot added the VELOX label Jun 22, 2026

liuneng1994 requested review from Copilot and liuneng1994 June 27, 2026 07:11

Copilot started reviewing on behalf of liuneng1994 June 27, 2026 07:11

Copilot AI reviewed Jun 27, 2026

Comment thread backends-velox/src/main/scala/org/apache/spark/sql/execution/utils/ExecUtil.scala Outdated

Comment on lines +176 to +178

      case other =>
        throw new GlutenNotSupportException(
          s"Partitioning ${other.getClass.getSimpleName} is not supported by native shuffle")
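The message in this thread names only the class of the partitioning. A minimal sketch of how that differs from interpolating the whole value is shown below. The `KeyGroupedPartitioning` case class here is a hypothetical stand-in with an invented two-field constructor, and `classNameOnly` and `fullToString` are illustrative helpers, not Gluten code.

```scala
// Hypothetical stand-in; Spark's KeyGroupedPartitioning has a different constructor.
case class KeyGroupedPartitioning(expressions: Seq[String], numPartitions: Int)

object MessageSketch {
  // Class name only: tells the reader which type failed, nothing more.
  def classNameOnly(p: Any): String =
    s"Partitioning ${p.getClass.getSimpleName} is not supported by native shuffle"

  // Full toString: a case class also prints its field values.
  def fullToString(p: Any): String =
    s"Partitioning $p is not supported by native shuffle"

  def main(args: Array[String]): Unit = {
    val p = KeyGroupedPartitioning(Seq("id"), 8)
    println(classNameOnly(p))
    // prints "Partitioning KeyGroupedPartitioning is not supported by native shuffle"
    println(fullToString(p))
    // prints "Partitioning KeyGroupedPartitioning(List(id),8) is not supported by native shuffle"
  }
}
```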

brijrajk force-pushed the fix/10992-keygrouped-partitioning-fallback branch from 761cfc7 to 37d4051 Compare June 27, 2026 13:25

github-actions Bot added the CORE works for Gluten Core label Jun 27, 2026

brijrajk wants to merge 1 commit into apache:main from brijrajk:fix/10992-keygrouped-partitioning-fallback


3 participants
